{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating accuracy of a model using calculations\n",
    "After you train a model, you need to get a sense of it's accuracy. The accuracy of a model gives you an idea of how much confidence you can put it predictions made by the model.\n",
    "\n",
    "The **scitkit-learn** and **numpy** libraries are both helpful for measuring model accuracy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by recreating our trained linear regression model from the last lesson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load our data from the csv file\n",
    "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') \n",
    "\n",
    "# Remove rows with null values since those will crash our linear regression model training\n",
    "delays_df.dropna(inplace=True)\n",
    "\n",
    "# Move our features into the X DataFrame\n",
    "X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]\n",
    "\n",
    "# Move our labels into the y DataFrame\n",
    "y = delays_df.loc[:,['ARR_DELAY']] \n",
    "\n",
    "# Split our data into test and training DataFrames\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "                                                    X, \n",
    "                                                    y, \n",
    "                                                    test_size=0.3, \n",
    "                                                    random_state=42\n",
    "                                                   )\n",
    "regressor = LinearRegression()     # Create a scikit learn LinearRegression object\n",
    "regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data\n",
    "\n",
    "y_pred = regressor.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Measuring accuracy\n",
    "Now that we have a trained model there are a number of metrics you can use to check the accuracy of the model. \n",
    "\n",
    "All these metrics are based on mathematical calculations, the key take-away here is you don't have to calculate everything yourself. Scikit-learn and numpy will do most of the work and provide good performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mean Squared Error (MSE)\n",
    "The MSE is the average error performed by the model when predicting the outcome for an observation. \n",
    "The lower the MSE, the better the model.\n",
    "\n",
    "MSE is the average squared difference between the observed actual outome values and the values predicted by the model.\n",
    "\n",
    "MSE = mean((actuals - predicteds)^2) \n",
    "\n",
    "We could write code to loop through our records comparing actual and predicated values to perform this calculation, but we don't have to! Just use **mean_squared_error** from the **scikit-learn** library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean Squared Error: 2250.4445141530855\n"
     ]
    }
   ],
   "source": [
    "from sklearn import metrics\n",
    "print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Root Mean Squared Error (RMSE)\n",
    "RMSE is the average error performed by the model when predicting the outcome for an observation. \n",
    "The lower the RMSE, the better the model.\n",
    "\n",
    "Mathematically, the RMSE is the square root of the mean squared error \n",
    "\n",
    "RMSE = sqrt(MSE)\n",
    "\n",
    "Skikit learn does not have a function for RMSE, but since it's just the square root of MSE, we can use the numpy library which contains lots of mathematical functions to calculate the square root of the MSE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mean Absolute Error (MAE)\n",
    "MAE measures the prediction error. The lower the MAE the better the model\n",
    "\n",
    "Mathematically, it is the average absolute difference between observed and predicted outcomes\n",
    "\n",
    "MAE = mean(abs(actuals - predicteds)). \n",
    "\n",
    "MAE is less sensitive to outliers compared to RMSE. Calculate RMSE using **mean_absolute_error** in the **scikit-learn** library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Mean absolute error: ',metrics.mean_absolute_error(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# R^2 or R-Squared\n",
    "\n",
    "R squared is the proportion of variation in the outcome that is explained by the predictor variables. It is an indication of how much the values passed to the model influence the predicted value. \n",
    "\n",
    "The Higher the R-squared, the better the model. Calculate R-Squared using **r2_score** in the **scikit-learn** library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('R^2: ',metrics.r2_score(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different models have different ways to measure accuracy. Fortunately **scikit-learn** and **numpy** provide a wide variety of functions to help measure accuracy."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
